gguf.md: Add GGUF Naming Convention Section #822
Conversation
Interesting! We could probably parse it on the HF side in the future if it makes sense and if it unlocks cool features. (We already attempt to extract the quantization type from the filename, but this could make it more robust. cc @mishig25)
If it helps, we follow a somewhat similar (but not exhaustive) convention. Standardisation in file names is always a great move!
Had a quick look. Do you mean your current naming arrangement is kind of like
If so, do you have a preferred form? I came to this form basically by casual observation of typical naming schemes on Hugging Face, hence this proposal. But obviously it's research by vibes, so it would be better if I had some feedback, especially from those who would be forced to try and parse such files. Ergo @JidongZhang-THU, would it make it easier for you if we made 'version' not optional? (Expert count being optional is okay, as it's easy to tell whether the x is there or not.) And of course... did I miss anything that would be useful for people parsing model file names?
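For illustration, here is a minimal parsing sketch of the proposed form (hypothetical Python, not anything llama.cpp ships; it assumes version and expert count are optional and that parameter counts end in K/M/B/T):

```python
import re

# Hypothetical pattern for <Model>-<Version>-<ExpertsCount>x<Parameters>-<EncodingScheme>.gguf
# with <Version> and <ExpertsCount>x treated as optional components.
GGUF_NAME = re.compile(
    r"^(?P<model>[A-Za-z0-9_.]+(?:-[A-Za-z0-9_.]+)*?)"  # model name, may contain dashes
    r"(?:-(?P<version>v\d+(?:\.\d+)*))?"                # optional version, e.g. v0.1
    r"-(?:(?P<experts>\d+)x)?"                          # optional expert count, e.g. 8x
    r"(?P<parameters>\d+(?:\.\d+)?[KMBT])"              # parameter count, e.g. 7B
    r"-(?P<encoding>[A-Za-z0-9_]+)"                     # encoding scheme, e.g. Q4_0
    r"\.gguf$"
)

def parse_gguf_filename(name):
    m = GGUF_NAME.match(name)
    return m.groupdict() if m else None

print(parse_gguf_filename("Mixtral-v0.1-8x7B-Q4_0.gguf"))
# -> {'model': 'Mixtral', 'version': 'v0.1', 'experts': '8',
#     'parameters': '7B', 'encoding': 'Q4_0'}
print(parse_gguf_filename("Hermes-2-Pro-Llama-3-8B-F16.gguf"))
# -> version and experts come back as None
```

If 'version' were made non-optional, the version group would simply lose its trailing `?`, which is one less ambiguous branch for every parser to worry about.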
Let me add a bit of backstory as to why we chose this naming scheme (which I'm more than happy to change): a typical user of the quantisation space would want to create quants for an arbitrary model on the Hub, and the model name would typically already carry the information about expert count and parameters. I'm open to ideas to align better; I just thought I'd provide more context.
@julien-c do you have a preference when it comes to parsing filenames? I'm basically treating it as a sort of
@mofosyne no, we'll adapt!
Thanks for the historical context. I might have gone a bit crazy here, but I've ended up mapping each enum name to its tensor type description and the historical context behind each PR that relates to its initial inclusion... Not even sure if it's allowed on this gguf.md page, so I'm just attaching it to this comment in case I should remove it. But hopefully it helps provide a general glance at each GGUF file type. Oh, and I've updated the page a bit. I opted for 'tensor type' rather than 'file name', as that appears to make more sense, to me at least.
@mofosyne, I've made a similar table of quant descriptions at https://huggingface.co/docs/hub/gguf#quantization-types (sharing just in case there's any useful info).
@mishig25 thanks. I decided to cross-reference your table with what I have, and this is the breakdown I was able to figure out. I'm not 100% sure on all the superblock configurations for the i-quantization, based on your statement and the llama.cpp PR descriptions, but I was able to extract some. I think gg would be a clearer source of truth here (especially for some of my general assertions below).
Encoding Scheme Name Table
@mishig25 when you made the table, were you able to figure out the superblock makeup and how to represent the weight formulae (in general)? Also, in your opinion, is this table in the right location, or should it be split up (and if so, where)? (And on a meta note... how much information should we really expose in this document? Too much can confuse developers.) Edit: Justine T also mentioned, regarding my 'Weights Encoding Scheme' table, that I may have issues using different names for the quants than what the software (presumably llama.cpp) uses. So I guess we could say this is not a super hard-and-fast mapping and can include other variants... but for the context of ggml this is the base scheme name. llama.cpp can then define its own extra naming (e.g. _S, _M and _L) in its own documentation (as extra pointers for users of what to expect).
IMO, the entire encoding section should just be reduced to simply:
The rest of the information is specific mainly to llama.cpp.
@ggerganov thanks, it looks much more compact and focused now. @mishig25 @julien-c @Vaibhavs10 @Green-Sky shall we lock this in? (I wonder if the table with bits, datatype, block config, etc. would be useful anywhere, such as the llama.cpp documentation, and if so, in which specific location.)
@ggerganov thanks for the merge. I've decided to place the table at https://github.com/ggerganov/llama.cpp/wiki/Tensor-Encoding-Schemes . It turns out the GitHub wiki kind of sucks at rendering tables, but I hope it's of help to everyone here.
@@ -18,6 +18,43 @@ GGUF is a format based on the existing GGJT, but makes a few changes to the form

The key difference between GGJT and GGUF is the use of a key-value structure for the hyperparameters (now referred to as metadata), rather than a list of untyped values. This allows for new metadata to be added without breaking compatibility with existing models, and to annotate the model with additional information that may be useful for inference or for identifying the model.

### GGUF Naming Convention

GGUF follows a naming convention of `<Model>-<Version>-<ExpertsCount>x<Parameters>-<EncodingScheme>.gguf`
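As an aside on the key-value point above, here is a minimal sketch of walking GGUF metadata with the `gguf` Python package that ships with llama.cpp (the `GGUFReader` API and field layout are assumptions here, so treat this as illustrative):

```python
# Minimal sketch, assuming llama.cpp's gguf-py package is installed
# (pip install gguf) and exposes GGUFReader as below.
from gguf import GGUFReader

reader = GGUFReader("model.gguf")  # hypothetical model path

# Every piece of metadata is a typed key-value field; readers can skip
# keys they do not recognise, which is what keeps GGUF forward-compatible.
for field in reader.fields.values():
    print(field.name, [t.name for t in field.types])
```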
@mofosyne great work!
Maybe the only missing information is the optional suffix with the shard info.
Example: "grok-1/grok-1-q4_0-00003-of-00009.gguf"
#820
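Following on from that, a minimal sketch of splitting off such a shard suffix (hypothetical Python; the five-digit zero-padded `-%05d-of-%05d` shape is assumed from the example above):

```python
import re

# Hypothetical: strip an optional "-00003-of-00009" style shard suffix.
SHARD_SUFFIX = re.compile(r"-(?P<shard>\d{5})-of-(?P<count>\d{5})\.gguf$")

def split_shard(name):
    m = SHARD_SUFFIX.search(name)
    if m is None:
        return name, None, None          # not sharded
    base = name[: m.start()] + ".gguf"   # base name without shard info
    return base, int(m["shard"]), int(m["count"])

print(split_shard("grok-1-q4_0-00003-of-00009.gguf"))
# -> ('grok-1-q4_0.gguf', 3, 9)
```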
This PR is based on the outfile default-name generation in ggerganov/llama.cpp#4858. The text was copied from there, but the historical references and the justification for why it was designed that way have been removed.
Feedback and adjustments are appreciated. Any changes to this will also mean updating llama.cpp's default name generation to match.
In addition, is there any filename generation in this repo? If so, we may want to update it to use this common naming scheme as well.